Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices

نویسندگان

  • Heiga Zen
  • Yannis Agiomyrgiannakis
  • Niels Egberts
  • Fergus Henderson
  • Przemyslaw Szczepaniak
چکیده

Acoustic models based on long short-term memory recurrent neural networks (LSTM-RNNs) were applied to statistical parametric speech synthesis (SPSS) and showed significant improvements in naturalness and latency over those based on hidden Markov models (HMMs). This paper describes further optimizations of LSTM-RNN-based SPSS for deployment on mobile devices; weight quantization, multi-frame inference, and robust inference using an -contaminated Gaussian loss function. Experimental results in subjective listening tests show that these optimizations can make LSTM-RNN-based SPSS comparable to HMM-based SPSS in runtime speed while maintaining naturalness. Evaluations between LSTM-RNNbased SPSS and HMM-driven unit selection speech synthesis are also presented.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Acoustic Modeling in Statistical Parametric Speech Synthesis – from Hmm to Lstm-rnn

Statistical parametric speech synthesis (SPSS) combines an acoustic model and a vocoder to render speech given a text. Typically decision tree-clustered context-dependent hidden Markov models (HMMs) are employed as the acoustic model, which represent a relationship between linguistic and acoustic features. Recently, artificial neural network-based acoustic models, such as deep neural networks, ...

متن کامل

Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis

Building text-to-speech (TTS) systems requires large amounts of high quality speech recordings and annotations, which is a challenge to collect especially considering the variation in spoken languages around the world. Acoustic modeling techniques that could utilize inhomogeneous data are hence important as they allow us to pool more data for training. This paper presents a long short-term memo...

متن کامل

Speech Synthesis Based on Hidden Markov Models and Deep Learning

Speech synthesis based on Hidden Markov Models (HMM) and other statistical parametric techniques have been a hot topic for some time. Using this techniques, speech synthesizers are able to produce intelligible and flexible voices. Despite progress, the quality of the voices produced using statistical parametric synthesis has not yet reached the level of the current predominant unit-selection ap...

متن کامل

Comparison of Modeling Target in LSTM-RNN Duration Model

Speech duration is an important component in statistical parameter speech synthesis(SPSS). In LSTM-RNN based SPSS system, the speech duration affects the quality of synthesized speech in two aspects, the prosody of speech and the position features in acoustic model. This paper investigated the effects of duration in LSTM-RNN based SPSS system. The performance of the acoustic models with positio...

متن کامل

E-PUR: An Energy-Efficient Processing Unit for Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a key technology for emerging applications such as automatic speech recognition, machine translation or image description. Long Short Term Memory (LSTM) networks are the most successful RNN implementation, as they can learn long term dependencies to achieve high accuracy. Unfortunately, the recurrent nature of LSTM networks significantly constrains the amoun...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016